disturbances are heteroscedastic, i.e., have non-constant variances.
mean heteroscedasticity, where the variances depend on a function g
maximum likelihood estimate is asymptotically more efficient than
weighted least squares with known weights. When g is unknown the

1. INTRODUCTION


We consider the following heteroscedastic linear regression model.
The observed data are $(Y_i, x_i)$ for $i = 1, \ldots, N$. Here $Y_i$ is a
scalar and $x_i$ is a p-vector. For technical reasons, we will assume that
the $(Y_i, x_i)$ are independent and identically distributed according to
the model.

(1.1)  Given $x_i$, $Y_i$ is distributed with mean $x_i^T\theta$ and variance
       $\sigma^2 Q(x_i,\theta)$ for some function $Q$. The density of
       $(Y_i - x_i^T\theta)\,(\sigma^2 Q(x_i,\theta))^{-1/2}$ is $h(\cdot)$, which
       is symmetric about zero and continuous.

(1.2)  The $\{x_i\}$ are bounded, independent and identically distributed
       random variables possessing, except for a possible intercept term,
       an absolutely continuous density $s(\cdot)$ with respect to some
       sigma-finite measure $\mu$.

The model (1.1) includes a wide variety of special cases, of which
two are the most important. The first we shall call mean-heterogeneity,
where the variance is a continuously differentiable function of the mean,
i.e.,

(1.3)  Mean-Heterogeneity:  $\mathrm{Var}(Y_i \mid x_i) = \sigma^2 g(x_i^T\theta)$.

The second special case is predictor-heterogeneity, where the variance
depends on known quantities through a continuously differentiable function
$g$, i.e.,

(1.4)  Predictor-Heterogeneity:  $\mathrm{Var}(Y_i \mid x_i) = \sigma^2 g(x_i)$.

In (1.3) and (1.4), $g$ is a density function with respect to some
sigma-finite measure on the support of $x^T\theta$ and $x$, respectively.

It may appear odd that $g$ is assumed to be a density. This was done so
that the general theory of Begun et al. (1983) is immediately
applicable. It should be relatively easy to extend their theory to
include classes of nuisance parameters $g$ which are not necessarily
densities, and this extension would be natural in our setting. However,
the extra work would lengthen this article without providing any
substantially new insights. Since $\sigma$ can be adjusted depending on $g$, in
practice it would not be difficult to standardize all the functions $g$
so that they are densities with respect to some measure.

The function $g(\cdot)$ in (1.3)-(1.4) is rarely known exactly, while
the density $h(\cdot)$ in (1.1) is usually assumed to be that of a standard
normal random variable. When $g(\cdot)$ is unknown except for a finite
number of parameters, a large literature is available for estimating $\theta$.
For mean-heterogeneity, see Pritchard, Downie & Bacon (1977), Jobson
& Fuller (1980) and Carroll & Ruppert (1982a), among others. For
predictor-heterogeneity, see Hildreth and Houck (1968), Carroll &
Ruppert (1983), and Johansen (1983).

There is less literature on estimating $\theta$ when the function $g(\cdot)$
in (1.3) or (1.4) is unknown and must be estimated; see Carroll (1982)
for a theoretical study and Matloff, Rose and Tai (1984) as well as an
unpublished report by Cohen, Dalal and Tukey for empirical studies.


Let $\hat\theta_W$ be the weighted least squares estimate with known
weights in model (1.1), i.e.,

       $\hat\theta_W = \left( \sum_{i=1}^N x_i x_i^T / Q(x_i,\theta) \right)^{-1} \sum_{i=1}^N x_i Y_i / Q(x_i,\theta)$.

Then, under regularity conditions,

(1.5)  $N^{1/2}(\hat\theta_W - \theta) \Rightarrow \mathrm{Normal}(0, S_W)$, where

       $S_W^{-1} = \mathrm{plim}_{N\to\infty}\; \sigma^{-2} N^{-1} \sum_{i=1}^N x_i x_i^T / Q(x_i,\theta)$.

We shall call $\hat\theta_W$ the optimal normal theory weighted least squares
estimate to indicate that it is based on knowing the weights and assuming
the $\{Y_i\}$ are normally distributed. Neither situation is likely to arise
too often in practice.
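
As a concrete illustration of the weighted least squares formula above, the following is a minimal sketch restricted to the case p = 2 (intercept plus one covariate). The function names `solve2` and `wls` and the toy data are assumptions made for this example and do not come from the paper.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a @ t = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def wls(xs, ys, q):
    """Weighted least squares with known variance weights q[i] = Q(x_i, theta).

    xs: list of 2-vectors, ys: list of responses.
    Solves (sum_i x_i x_i^T / q_i) theta = sum_i x_i y_i / q_i.
    """
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, y, qi in zip(xs, ys, q):
        for j in range(2):
            b[j] += x[j] * y / qi
            for k in range(2):
                a[j][k] += x[j] * x[k] / qi
    return solve2(a, b)


# Hypothetical toy data: mean 1 + 2u, variance proportional to (1 + 2u)^2.
us = [0.1 * i for i in range(1, 11)]
xs = [[1.0, u] for u in us]
ys = [1.0 + 2.0 * u for u in us]        # noiseless responses
q = [(1.0 + 2.0 * u) ** 2 for u in us]  # known weights
theta_w = wls(xs, ys, q)
```

Because the toy responses are noiseless, `theta_w` reproduces the coefficients (1, 2) up to rounding, whatever positive weights are supplied.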

For mean-heterogeneity (1.3), Carroll (1982) showed by construction
that by using nonparametric kernel regression estimation of squared
least squares residuals on least squares predicted values
$x_i^T\hat\theta_{LS}$, an estimate $\hat g_N(\cdot)$ of $g(\cdot)$ can be
constructed with the following property. Let $\hat\theta_R$ be the weighted
least squares estimate based on the estimated weights
$1/\hat g_N(x_i^T\hat\theta_{LS})$. Then

(1.6)  $N^{1/2}(\hat\theta_R - \theta) \Rightarrow \mathrm{Normal}(0, S_W)$.

Comparing (1.5) with (1.6) we see that asymptotically one can do as well
as the optimal normal theory weighted least squares estimate even if the
variance function is completely unknown. Carroll (1982) also showed a
similar result for a special class of predictor-heterogeneity models.
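
The two-stage idea just described (smooth squared least squares residuals against fitted values, then reweight) can be sketched as follows. This is a simplified illustration and not Carroll's (1982) exact estimator: the Gaussian kernel, the fixed bandwidth of 0.3, the restriction to p = 2, and all function names are assumptions made for the example.

```python
import math
import random


def solve2(a, b):
    """Solve the 2x2 linear system a @ t = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def fit(xs, ys, w):
    """Least squares with weights w[i] (use all ones for ordinary LS)."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for x, y, wi in zip(xs, ys, w):
        for j in range(2):
            b[j] += wi * x[j] * y
            for k in range(2):
                a[j][k] += wi * x[j] * x[k]
    return solve2(a, b)


def kernel_variance(fitted, sq_resid, bandwidth):
    """Nadaraya-Watson smooth of squared residuals against fitted values."""
    out = []
    for v in fitted:
        k = [math.exp(-0.5 * ((v - f) / bandwidth) ** 2) for f in fitted]
        out.append(sum(ki * r for ki, r in zip(k, sq_resid)) / sum(k))
    return out


def estimated_weight_wls(xs, ys, bandwidth=0.3):
    """Stage 1: ordinary LS. Stage 2: WLS with weights 1 / g_hat(fitted)."""
    ols = fit(xs, ys, [1.0] * len(ys))
    fitted = [ols[0] * x[0] + ols[1] * x[1] for x in xs]
    sq_resid = [(y - f) ** 2 for y, f in zip(ys, fitted)]
    g_hat = [max(g, 1e-12) for g in kernel_variance(fitted, sq_resid, bandwidth)]
    return fit(xs, ys, [1.0 / g for g in g_hat])


# Hypothetical simulated data: mean m = 1 + 2u, standard deviation 0.1 * m.
random.seed(0)
us = [random.random() for _ in range(200)]
xs = [[1.0, u] for u in us]
ys = [(1.0 + 2.0 * u) * (1.0 + 0.1 * random.gauss(0.0, 1.0)) for u in us]
theta_r = estimated_weight_wls(xs, ys)
```

The small floor on `g_hat` only guards against division by zero; a serious implementation would also need a data-driven bandwidth.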

If $g(\cdot)$ were known either exactly or up to a finite number of
parameters, and if the error density $h(\cdot)$ in (1.1) is the normal density,
then one could consider the normal theory maximum likelihood estimate
$\hat\theta_{ML}$. For mean-heterogeneity, there is information about $\theta$ in
the variances as well as the mean, and Jobson & Fuller (1980) were able to
show that the maximum likelihood estimate of $\theta$ is asymptotically
preferable to optimal weighted least squares. More precisely,

(1.7)  $N^{1/2}(\hat\theta_{ML} - \theta) \Rightarrow \mathrm{Normal}(0, S_{ML})$, where $S_{ML} \le S_W$.

For reasons of robustness, we have some doubts as to whether maximum
likelihood should be the method of choice for small samples; see Carroll
& Ruppert (1982b).

In the predictor-heteroscedasticity model (1.4), with normally distributed
observations, we thus know that it is possible in some circumstances to
achieve the asymptotic performance of optimal weighted least squares
even when the variance function $g(x)$ is completely unknown. One purpose
of this note is to explore the generality of this phenomenon. In
particular, if the symmetric error density $h(\cdot)$ in (1.1) is unknown as
well as the variance function $g(\cdot)$ in (1.4), we show that the information
available for estimating $\theta$ is the same as when $h(\cdot)$ and $g(\cdot)$ are
completely known. We obtain our results by applying the theory of Begun
et al. (1983) to models (1.3) and (1.4).

For normally distributed observations in a mean-heteroscedasticity
model, we have two asymptotic facts. First, it is possible to reproduce
optimal weighted least squares even when the variance function $g(x^T\theta)$
in (1.3) is completely unknown. Second, if the form of $g(\cdot)$ is known
up to parameters, at the normal model it is possible to improve upon
optimal weighted least squares by using maximum likelihood. This leaves
unanswered two interesting questions. First, if $g(\cdot)$ is unknown in
model (1.3), is it possible to achieve the performance of maximum
likelihood with known $g(\cdot)$? Using the theory of Begun et al., we show that
the answer to this question is no, which in retrospect is perhaps not
surprising. Second, if $g(\cdot)$ is unknown in model (1.3), is it still
possible to improve upon optimal weighted least squares? We obtain only
a partial but perhaps surprising positive answer to this question.

More precisely, we show that if $g(\cdot)$ in (1.3) and $s(\cdot)$ in (1.2) are
smooth, then the information available for estimating $\theta$ is more than
that provided by weighted least squares.

The paper is organized as follows. In Section 2 we outline the
theory developed by Begun et al. In Sections 3-5 we apply this theory
to our problems, discussing in Section 6 the possibility of constructing
estimators which achieve the relevant information bounds.


2. THE BEGUN, HALL, HUANG & WELLNER THEORY

Lower bounds for estimation in semiparametric models are an area
undergoing considerable development. Suppose that $z_1, z_2, \ldots, z_n$ are
independent and identically distributed random vectors possessing a
density function $f(\cdot,\theta,\sigma,g)$ with respect to a sigma-finite
measure $\mu$.

Here $\theta$ is a vector of parameters of interest, $\sigma$ is a vector of
nuisance parameters and $g = (g_1, g_2, g_3)$ are densities with respect to
sigma-finite measures $\nu_1$, $\nu_2$, $\nu_3$ respectively. Begun et al.
(1983) provide upper bounds on the information available for estimating
$\theta$ when $(\sigma, g)$ is unknown. Informally, their major result can be
summarized as follows.

Let $\ell(\cdot,\theta,\sigma,g)$ be the logarithm of $f(\cdot,\theta,\sigma,g)$ and
let $\ell_\theta$, $\ell_\sigma$ be the derivatives of the log-likelihood $\ell$
with respect to $\theta$ and $\sigma$, respectively. Define

       $B = E\,\ell_\theta \ell_\sigma^T$,  $D = E\,\ell_\sigma \ell_\sigma^T$  and
       $\ell_{\theta,\sigma} = \ell_\theta - B D^{-1} \ell_\sigma$.
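
The definition of $\ell_{\theta,\sigma}$ is a least squares projection: it removes from $\ell_\theta$ its component along $\ell_\sigma$. A minimal sketch for scalar $\theta$ and $\sigma$ follows, with expectations replaced by sample averages. The function name and the particular score values (a standard normal error with the design-dependent factors set to 1) are assumptions chosen only for illustration.

```python
import random


def effective_score(l_theta, l_sigma):
    """Sample analog of l_theta - B D^{-1} l_sigma for scalar parameters.

    B and D are estimated by sample averages of l_theta*l_sigma and l_sigma^2.
    """
    n = len(l_theta)
    b = sum(t * s for t, s in zip(l_theta, l_sigma)) / n
    d = sum(s * s for s in l_sigma) / n
    return [t - (b / d) * s for t, s in zip(l_theta, l_sigma)]


# Hypothetical scores evaluated at standard normal residuals r.
random.seed(1)
r = [random.gauss(0.0, 1.0) for _ in range(1000)]
l_sigma = [ri * ri - 1.0 for ri in r]               # scale score with sigma = 1
l_theta = [ri + 0.5 * (ri * ri - 1.0) for ri in r]  # a score correlated with l_sigma
l_eff = effective_score(l_theta, l_sigma)

# By construction the effective score is uncorrelated with the nuisance score.
cross = sum(e * s for e, s in zip(l_eff, l_sigma)) / len(r)
```

The quantity `cross` is zero up to rounding, which is the sample version of the orthogonality that motivates the definition.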


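Let $|\cdot|$ denote the Euclidean norm and $\|\cdot\|$ the $L^2$ norm.
Suppose that there are bounded linear operators
$A_k : L^2(\nu_k) \to L^2(\mu)$ $(k = 1,2,3)$ for which

(2.1)  $|n^{1/2}(\theta_n - \theta_0) - \tau_\theta| \to 0$,  $|n^{1/2}(\sigma_n - \sigma_0) - \tau_\sigma| \to 0$,
       $\| n^{1/2}(g_{kn}^{1/2} - g_k^{1/2}) - \beta_k \|_{\nu_k} \to 0$  $(k = 1,2,3)$

implies, for $g_n = (g_{1n}, g_{2n}, g_{3n})$,

       $E\{\, 2 n^{1/2} [ f^{1/2}(\cdot,\theta_n,\sigma_n,g_n) - f^{1/2}(\cdot,\theta_0,\sigma_0,g) ] / f^{1/2}(\cdot,\theta_0,\sigma_0,g)$
       $\quad - \ell_\theta^T \tau_\theta - \ell_\sigma^T \tau_\sigma - 2 \sum_{k=1}^3 (A_k\beta_k) / f^{1/2}(\cdot,\theta_0,\sigma_0,g) \,\}^2 \to 0$.

When this holds, $f^{1/2}(\cdot,\theta,\sigma,g)$ is said to be Hellinger differentiable.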

As discussed by Begun et al. (1983), $\ell_\theta$ and $\ell_\sigma$ are the score
functions for $\theta$ and $\sigma$, and $\ell_{\theta,\sigma}$ is the effective score
for $\theta$ when $\sigma$ is a nuisance parameter but the $g_k$ are known. Begun
et al. have a small technical error in their Remark 3.2, where they compute
the "effective score for $\theta$" in the presence of an $R^c$-valued nuisance
parameter $\eta$, which corresponds here to $\sigma$, and a density nuisance
parameter $g$. Briefly, the effective score for $\theta$ is the part of the
score for $\theta$ orthogonal to the subspace spanned by the score for $\eta$
and the score for $g$. Begun et al. compute this score by finding
$\rho_{\theta.\eta}$, the score for $\theta$ in the presence of $\eta$, and then
taking the part of $\rho_{\theta.\eta}$ orthogonal to the space of scores for
$g$. Although convenient, their computational method is correct only under
the condition that the score for $\eta$ is orthogonal to the score for $g$.
In our notation this condition is equivalent to having

(2.2)  $E\left[ \ell_\sigma \left( 2 \sum_{k=1}^3 A_k\beta_k / f^{1/2} \right) \right] = 0$

for all choices of $\beta_k$ in $B_k$. For ease of computation we will choose
the measure $\nu_k$ so that (2.2) holds.

To continue the Begun et al. method of computing the effective score
for $\theta$, if $\theta$ is p-dimensional, find p-vectors $\beta^*_1$, $\beta^*_2$,
$\beta^*_3$ for which

(2.3)  $E\left\{ \left( \ell_{\theta,\sigma} - 2 \sum_{k=1}^3 A_k\beta^*_k / f^{1/2} \right) \left[ 2 \sum_{k=1}^3 A_k\beta_k / f^{1/2} \right] \right\} = 0$

for all $\beta_1$, $\beta_2$, $\beta_3$ in appropriate sets of functions $B_1$, $B_2$,
$B_3$. Technically, the sets $B_k$ must be closed subspaces of $L^2(\nu_k)$
such that

(2.4)  $\int \beta_k\, g_k^{1/2}\, d\nu_k = 0$  if $\beta_k \in B_k$.


Equation (2.4) would not be necessary if we dropped the requirement
that $g_k$ be a density with respect to $\nu_k$. However, by a judicious choice
of $\nu_k$, (2.4) implies (2.2), and this fact is at least a minor convenience.
If $\beta^*_k$ can be computed and is an element of $B_k$, then the information
bound is

(2.5)  $I_* = E\, \ell_* \ell_*^T$,

where

(2.6)  $\ell_* = \ell_{\theta,\sigma} - 2 \sum_{k=1}^3 A_k\beta^*_k / f^{1/2}$.

The function (2.6) is the efficient score.

In a sense made precise by Begun et al., for any regular estimator
$\hat\theta_N$ of $\theta$ for which

(2.7)  $N^{1/2}(\hat\theta_N - \theta) \Rightarrow N(0, \Sigma)$,

the best one can hope to do is to have $\Sigma = I_*^{-1}$, i.e., we must have
that in (2.7),

(2.8)  $\Sigma \ge I_*^{-1}$.


3. INFORMATION BOUNDS FOR PREDICTOR-HETEROSCEDASTICITY

Consider the model defined by (1.1), (1.2) and (1.4). We wish to
estimate $\theta$, with $\sigma$ unknown. Also, the density $s(\cdot)$ of $x$ is unknown, as
is the symmetric density $h(\cdot)$ of $(Y - x^T\theta)/(\sigma g^{1/2}(x))$ and the
variance function $g(\cdot)$. We will show that the information bound $I_*$ in
(2.5) is exactly the usual parametric information $I$ computed with
$(g,h,s)$ all known. In the language of Begun et al., this is a situation
for which adaptation is possible.

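The density function for the predictor-heterogeneity model is

(3.1)  $f = [\sigma g^{1/2}(x)]^{-1}\, s(x)\, h\{(Y - x^T\theta)/(\sigma g^{1/2}(x))\}$.

Writing $r = (Y - x^T\theta)/(\sigma g^{1/2}(x))$, it follows directly that

(3.2)  $\ell_\theta = -\frac{\dot h(r)}{h(r)}\, \frac{x}{\sigma g^{1/2}(x)}$,  $\ell_\sigma = -\frac{1}{\sigma} \left\{ r \dot h(r)/h(r) + 1 \right\}$,

where $\dot h(r)/h(r)$ is an odd function of $r$. Since $B = E\,\ell_\theta \ell_\sigma^T = 0$,
we have $\ell_{\theta,\sigma} = \ell_\theta$. If we let

(3.3)  $\| n^{1/2}(g_n^{1/2} - g^{1/2}) - \beta_1 \|_{\nu_1} \to 0$,
       $\| n^{1/2}(h_n^{1/2} - h^{1/2}) - \beta_2 \|_{\nu_2} \to 0$,

(3.4)  $\| n^{1/2}(s_n^{1/2} - s^{1/2}) - \beta_3 \|_{\nu_3} \to 0$,

then one finds that

(3.5)  $2(A_1\beta_1)/f^{1/2} = -\beta_1(x) \{ r \dot h(r)/h(r) + 1 \} / g^{1/2}(x)$,
       $2(A_2\beta_2)/f^{1/2} = 2 \beta_2(r)/h^{1/2}(r)$,
       $2(A_3\beta_3)/f^{1/2} = 2 \beta_3(x)/s^{1/2}(x)$.

Since $r \dot h(r)/h(r)$ is an even function, as is $\beta_2(r)/h^{1/2}(r)$, we have
that for $k = 1,2,3$

(3.6)  $E\{ \ell_{\theta,\sigma} (2 A_k\beta_k / f^{1/2}) \} = 0$.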

Note that (2.2) holds if $E[\beta_1(x)/g^{1/2}(x)] = 0$, i.e., if
$\int \beta_1(u)/g^{1/2}(u)\, s(u)\, du = 0$. Therefore (2.2) is implied by (2.4) if
$\nu_1$ is chosen so that $g_0(u)\, d\nu_1(u) = s_0(u)\, du$, where $s_0$ and $g_0$ are
the true values of the density parameters $s$ and $g$. It follows from (3.6)
and (2.3) that $\beta^*_k = 0$ for $k = 1,2,3$, so that the efficient score is
$\ell_* = \ell_{\theta,\sigma}$ and

(3.7)  $I_* = E\, \ell_{\theta,\sigma} \ell_{\theta,\sigma}^T = E\{\dot h(r)/h(r)\}^2\; E\left\{ \frac{x x^T}{\sigma^2 g(x)} \right\}$.

Note that this is the same information bound as for the purely parametric
case that $g,h,s$ are all known. Thus, in principle at least, asymptotically
we should be able to adapt to $(g,h,s)$, i.e., estimate $\theta$ as well as if
$(g,h,s)$ were known.
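
To make the two factors in (3.7) concrete, the sketch below approximates the scalar factor $E\{\dot h(r)/h(r)\}^2$ by the trapezoid rule and forms a sample analog of the matrix factor. For the standard normal density the scalar factor equals 1, so (3.7) then reduces to $S_W^{-1}$ from (1.5). The function names, the integration range, and the convention that `g` takes the whole vector `x` are assumptions made for this example.

```python
import math


def location_information(h, hdot, lo=-12.0, hi=12.0, n=20000):
    """Trapezoid-rule approximation of E{hdot(r)/h(r)}^2 = integral of hdot^2 / h."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        r = lo + i * step
        w = 0.5 if i in (0, n) else 1.0
        total += w * hdot(r) ** 2 / h(r)
    return total * step


def normal_pdf(r):
    return math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)


def normal_pdf_dot(r):
    return -r * normal_pdf(r)


def info_bound_37(xs, g, sigma, c_h):
    """Sample analog of (3.7): c_h times the average of x x^T / (sigma^2 g(x))."""
    p, n = len(xs[0]), len(xs)
    out = [[0.0] * p for _ in range(p)]
    for x in xs:
        v = sigma ** 2 * g(x)
        for j in range(p):
            for k in range(p):
                out[j][k] += c_h * x[j] * x[k] / (v * n)
    return out


c_normal = location_information(normal_pdf, normal_pdf_dot)
# Hypothetical one-point design with g(x) = 4 and sigma = 0.5, so sigma^2 g = 1.
bound = info_bound_37([[1.0, 2.0]], lambda x: 4.0, 0.5, 1.0)
```

Swapping in another symmetric density for `normal_pdf` changes only the scalar factor, which is how the error distribution enters the bound.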
4. INFORMATION BOUNDS FOR MEAN-HETEROSCEDASTICITY

The model is (1.1)-(1.3). Again $s(x)$ is the density of $x$ and $h(\cdot)$
is the density of $(Y - x^T\theta)/(\sigma g^{1/2}(x^T\theta))$, but now the variance
of $Y$ is $\sigma^2 g(x^T\theta)$. In this section we compute the information
bound $I_*$ for estimating $\theta$ when $(g,h,s)$ is unknown, and find that it
is between the parametric information when $(g,h,s)$ is known and the
asymptotic variance of a "weighted" likelihood estimate.

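The density function is given by

(4.1)  $f = (\sigma g^{1/2}(x^T\theta))^{-1}\, s(x)\, h\{(Y - x^T\theta)/(\sigma g^{1/2}(x^T\theta))\}$.

Again letting $r = (Y - x^T\theta)/(\sigma g^{1/2}(x^T\theta))$, we find that

(4.2)  $\ell_{\theta,\sigma} = -\left( r \dot h(r)/h(r) + 1 \right) \left\{ \frac{x\, \dot g(x^T\theta)}{2 g(x^T\theta)} - \frac{1}{2} E(x\, \dot g/g) \right\} - \frac{x\, \dot h(r)}{\sigma g^{1/2}(x^T\theta)\, h(r)}$.

Considering (3.2)-(3.5) with the difference that now $g$ is a function of
$x^T\theta$, we obtain

       $2(A_1\beta_1)/f^{1/2} = -\beta_1(x^T\theta) \{ r \dot h(r)/h(r) + 1 \} / g^{1/2}(x^T\theta)$,
       $2(A_2\beta_2)/f^{1/2} = 2 \beta_2(r)/h^{1/2}(r)$,
       $2(A_3\beta_3)/f^{1/2} = 2 \beta_3(x)/s^{1/2}(x)$.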

The orthogonality condition (3.6) holds for $A_2\beta_2$ and $A_3\beta_3$ but not
for $A_1\beta_1$. Suppose that $x^T\theta$ is not constant on the support of $x$,
where $\theta$ is the true parameter. Noting that $(A_1\beta_1)/f^{1/2}$ is a
function of the data only through $|r|$ and $x^T\theta$, it follows from the
least squares projection theorem that

(4.3)  $2(A_1\beta^*_1)/f^{1/2} = E\{ \ell_{\theta,\sigma} \mid x^T\theta, |r| \}$
       $\quad = -\left( r \dot h(r)/h(r) + 1 \right) \left\{ \frac{\dot g(x^T\theta)}{2 g(x^T\theta)}\, E(x \mid x^T\theta) - \frac{1}{2} E(x\, \dot g/g) \right\}$.

To see this, we must check two points. The first is (2.3), which follows
immediately upon noting that $A_2\beta_2$ and $A_1\beta_1$ depend on $r$ only through
$|r|$, while it holds for $A_3\beta_3$ since $E\{ r \dot h(r)/h(r) \} = -1$. As
before, (2.2) follows from the judicious choice of $\nu_1$. To see this, let

       $t(v) = E(x \mid x^T\theta = v)$;

we see that

       $2 g^{1/2}(v)\, \beta^*_1(v) = \dot g(v)\, t(v) - g(v)\, E(x\, \dot g/g)$.

Suppose that we are interested in the information bound at
$(\theta_0, \sigma_0, g_0, h_0, s_0)$. Assume that $g_0$ is continuous, that $h_0$, $s_0$ are
densities with respect to measures $\nu_2$ and $\nu_3$ respectively, and that
the random variable $x^T\theta_0$ has continuous density $\phi_0$ with respect to
Lebesgue measure. Define $\nu_1(da) = (\phi_0(a)/g_0(a))\, da$. Then condition
(2.2) will hold. From (4.3) it follows that the efficient score function is

(4.4)  $\ell_* = -\frac{x\, \dot h(r)}{\sigma g^{1/2}(x^T\theta)\, h(r)} - \frac{1}{2} \left( r \dot h(r)/h(r) + 1 \right) \frac{\dot g(x^T\theta)}{g(x^T\theta)}\, \{ x - E(x \mid x^T\theta) \}$.


The information bound is then

(4.5)  $I_* = E\, \ell_* \ell_*^T = S_W^{-1} + (1/4)\, E\{ r \dot h(r)/h(r) + 1 \}^2$
       $\quad \cdot\; E\{ (\dot g(x^T\theta)/g(x^T\theta))^2\, (x - E(x \mid x^T\theta))(x - E(x \mid x^T\theta))^T \}$.

When $x^T\theta$ is constant over the support of $x$, $A_1\beta_1/f^{1/2}$ is a function
of the data only through $|r|$, so that $g(x^T\theta) = 1$ by convention and

       $\ell_* = \ell_{\theta,\sigma} = -\frac{x\, \dot h(r)}{\sigma\, h(r)}$,  $I_* = S_W^{-1}$.

In effect, for the homoscedastic case that either $g$ or $x^T\theta$ is constant,
the information bound is the usual homoscedastic information.
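
The second term of (4.5) can be illustrated numerically. The sketch below assumes a standard normal $h$, for which $r \dot h(r)/h(r) + 1 = 1 - r^2$ and the scalar factor equals 2, and it uses a small discrete design so that $E(x \mid x^T\theta)$ can be computed by grouping rows with equal $x^T\theta$. The discrete design is purely an illustrative device (the paper assumes a continuous design in (1.2), where a smoother would be needed), and all names are hypothetical.

```python
import math


def scale_information_normal(lo=-12.0, hi=12.0, n=20000):
    """Trapezoid approximation of E{r hdot(r)/h(r) + 1}^2 for standard normal h."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        r = lo + i * step
        w = 0.5 if i in (0, n) else 1.0
        phi = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
        total += w * (1.0 - r * r) ** 2 * phi
    return total * step


def extra_information(xs, theta, gdot_over_g, c_scale):
    """Sample analog of the second term in (4.5).

    E(x | x^T theta) is taken to be the mean of x over rows sharing the same
    (rounded) value of x^T theta, which only makes sense for designs with ties.
    """
    p, n = len(theta), len(xs)
    groups = {}
    for x in xs:
        v = round(sum(a * b for a, b in zip(x, theta)), 9)
        groups.setdefault(v, []).append(x)
    out = [[0.0] * p for _ in range(p)]
    for v, rows in groups.items():
        mean = [sum(row[j] for row in rows) / len(rows) for j in range(p)]
        for x in rows:
            d = [x[j] - mean[j] for j in range(p)]
            for j in range(p):
                for k in range(p):
                    out[j][k] += 0.25 * c_scale * gdot_over_g(v) ** 2 * d[j] * d[k] / n
    return out


c_scale = scale_information_normal()

# Simple linear regression: x -> x^T theta is one-to-one, so the term vanishes.
slr = extra_information([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 2.0],
                        lambda v: 2.0 / v, c_scale)

# Two covariates with a tie in x^T theta = 2, and g(v) = v^2 so gdot/g = 2/v.
tied = extra_information([[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]], [1.0, 1.0, 1.0],
                         lambda v: 2.0 / v, c_scale)
```

In the first design every group contains a single row, so `slr` is the zero matrix. In the second, the two rows differ while sharing the same mean, and `tied` is nonzero in the covariate block, which is the extra information beyond weighted least squares.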


5. INFORMATION BOUNDS AT THE NORMAL DISTRIBUTION


It is worth noting the following fact. Suppose we know $h(\cdot)$ a
priori, e.g., we assume that the data follow a normal distribution.
There is no extra information involved in knowing $h(\cdot)$ exactly if we
already know it is symmetrically distributed about zero as in (1.1).
Thus, even assuming normality, the information bounds (3.7) and (4.5)
are unchanged.

6. ACHIEVING THE INFORMATION BOUNDS


For normally distributed observations and simple linear regression,
the bound (3.7) is achieved by the estimator introduced by Carroll (1982).
For mean heteroscedasticity in simple linear regression or any other model
where the map $x \to x^T\theta$ is one-to-one, $x = E(x \mid x^T\theta)$ so that the
second term in (4.5) vanishes. In this case, an estimator introduced by
Carroll (1982) has been shown to achieve the information bounds when the
data are normally distributed.




We now sketch our reasons for believing that estimators can be
constructed which achieve the information bounds (3.7) and (4.5).
We are presently working on finding precise conditions under which the
following arguments are technically correct. Our starting point is the
one-step construction used in Theorem 3.1 of Bickel (1982) and
generalized somewhat in an unpublished paper by W.M. Huang. As outlined
by Huang and by Bickel in his conditions GR(iv) and H', the three essential
steps are (1) that a root-N consistent estimator of $\theta$ exists; (2)
that consistent estimates of the information bound (3.7) or (4.5)
exist; (3) that we can consistently estimate the optimal score function
(3.2) or (4.4) rather well. The first step is easy, because least
squares and simple M-estimates are already root-N consistent. For
predictor-heteroscedasticity, the second and third steps should not
be too hard to verify by using the kernel estimate of $g(\cdot)$ proposed
by Carroll (1982) and a kernel estimate of $\dot h/h$ as in, for example,
Lemma 4.1 of Bickel (1982). The second and third steps should hold
for mean-heteroscedasticity as well, but are likely to be much harder
technically. The reason is that in (4.4) and (4.5), we need to estimate
not only $g$ and $\dot h/h$, but also $\dot g/g$ and $E(x \mid x^T\theta)$.
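
The one-step construction referred to above can be sketched in its simplest form: starting from a preliminary estimate, add the inverse estimated information times the average estimated score. The example below makes strong simplifying assumptions that the paper does not make, namely the predictor model with standard normal $h$, known $g$ and known $\sigma$, and p = 2, so that the score (3.2) is linear in $\theta$. Function names and data are hypothetical.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a @ t = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def one_step(xs, ys, theta0, g, sigma):
    """Return theta0 + I^{-1} * (average score at theta0).

    Under the normal model the score (3.2) is (y - x^T theta) x / (sigma^2 g(x))
    and the information (3.7) is the average of x x^T / (sigma^2 g(x)).
    """
    n = len(ys)
    info = [[0.0, 0.0], [0.0, 0.0]]
    score = [0.0, 0.0]
    for x, y in zip(xs, ys):
        v = sigma ** 2 * g(x)
        resid = y - (theta0[0] * x[0] + theta0[1] * x[1])
        for j in range(2):
            score[j] += resid * x[j] / (v * n)
            for k in range(2):
                info[j][k] += x[j] * x[k] / (v * n)
    step = solve2(info, score)
    return [theta0[0] + step[0], theta0[1] + step[1]]


# Hypothetical data with a deterministic perturbation standing in for noise.
us = [0.1 * i for i in range(1, 21)]
xs = [[1.0, u] for u in us]
ys = [1.0 + 2.0 * u + 0.05 * (-1) ** i * (1.0 + u) for i, u in enumerate(us)]


def g(x):
    return (1.0 + x[1]) ** 2


theta_a = one_step(xs, ys, [0.0, 0.0], g, 0.05)
theta_b = one_step(xs, ys, [5.0, -3.0], g, 0.05)
```

Because the score is linear here, a single step from any starting value lands on the weighted least squares solution, so `theta_a` and `theta_b` agree. In the semiparametric setting of the paper the score and information would themselves be estimated, and the starting value would need to be root-N consistent.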

If the distribution of $\{x_i\}$ is discrete rather than continuous
as assumed in (1.2), there are problems of identifiability since many
different functions $g$ will fit the variance function at each support
point $x$. Our belief is that, for this case, no real asymptotic
improvement will be possible over ordinary weighted least squares unless
the function $g$ is more tightly specified.


The issue of robustness raised by Carroll & Ruppert (1982b) is
still an important one. Generalized least squares, which has asymptotic
variance $S_W$, is typically rather robust against small deviations in the
model, e.g., the variance not quite a function of the mean, say, but
rather depending on $x$ in a slightly different fashion. Our guess is that
the same cannot be said of any estimator achieving the information bound.

We have calculated the possible asymptotic improvement over generalized
least squares for normally distributed data, with the design and means of
Jobson & Fuller (1980), for the special model

       $E\, Y_i = x_i^T\theta$,

       $\mathrm{Var}(Y_i) = \sigma^2 (x_i^T\theta)^{\alpha}$.

The improvement tended to be monotonically increasing in the coefficient
of variation, becoming noticeable only when the average coefficient of
variation exceeded 0.40. In our experience, nearly normally distributed
heteroscedastic data typically have average coefficients of variation not
exceeding 0.30. While our calculations are too fragmentary to support any
general conclusions, they do suggest that when the form of the variance
function is unknown, the simple smoothing techniques of Carroll (1982)
will often be nearly asymptotically efficient.

Comparisons based on $|\Sigma|$ of various asymptotic covariances $\Sigma$

Form of Variance Function        Generalized

